xen.git
Keir Fraser [Mon, 23 Nov 2009 07:17:32 +0000 (07:17 +0000)]
pygrub: expands tabs before displaying menus.

Otherwise the highlighting and line-length trimming do not work as
expected, and the display appears corrupted.
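pygrub is written in Python, where str.expandtabs() does this directly. To keep the examples here in one language, below is a C sketch of the same idea. The helper name and the caller-supplied tab stop are assumptions. Once tabs are expanded, each output character occupies exactly one screen column, so width-based trimming and highlighting line up with what is drawn.

```c
#include <stddef.h>

/* Hypothetical helper: expand each '\t' in src to spaces up to the next
 * multiple of tabstop (must be non-zero), writing at most dstlen-1
 * characters plus a terminating NUL into dst. Returns the number of
 * characters written, which equals the resulting display width. */
size_t expand_tabs(const char *src, char *dst, size_t dstlen, size_t tabstop)
{
    size_t col = 0;

    if (dstlen == 0)
        return 0;
    for (; *src != '\0'; src++) {
        if (*src == '\t') {
            /* Pad with spaces until the next tab stop is reached. */
            do {
                if (col + 1 >= dstlen)
                    goto out;
                dst[col++] = ' ';
            } while (col % tabstop != 0);
        } else {
            if (col + 1 >= dstlen)
                goto out;
            dst[col++] = *src;
        }
    }
out:
    dst[col] = '\0';
    return col;
}
```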

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Mon, 23 Nov 2009 07:17:10 +0000 (07:17 +0000)]
pygrub: if default entry is "saved" then use first entry.

pygrub doesn't support the "savedefault" command and will error out if
menu.lst uses the "default saved" directive. We might as well start on
the first entry in this case instead of failing.
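pygrub itself is Python, so the following is only a C sketch of the fallback rule. The helper name is hypothetical. Treating non-numeric or out-of-range values the same way as "saved" is an extra assumption that the commit does not state.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: map the argument of a menu.lst "default" directive
 * to an entry index. "saved" is unsupported, so start on entry 0. */
int default_entry_index(const char *arg, int nr_entries)
{
    char *end;
    long idx;

    if (arg == NULL || strcmp(arg, "saved") == 0)
        return 0;
    idx = strtol(arg, &end, 10);
    /* Assumption: unparsable or out-of-range values also fall back to 0. */
    if (end == arg || *end != '\0' || idx < 0 || idx >= nr_entries)
        return 0;
    return (int)idx;
}
```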

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Mon, 23 Nov 2009 07:16:23 +0000 (07:16 +0000)]
xsm: Change format strings from signed to unsigned
...to reflect the variables being passed in.

Signed-off-by: Paul Nuzzi <pjnuzzi@tycho.ncsc.mil>

Keir Fraser [Mon, 23 Nov 2009 07:14:33 +0000 (07:14 +0000)]
pcifront: fix multiple initialization bug

Now that we have pcifront_watches to initialize pcifront dynamically,
we no longer need to call init_pcifront in pcilib and pcifront_scan;
we should just wait for the frontend to connect to the backend
instead.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 23 Nov 2009 07:13:59 +0000 (07:13 +0000)]
libxc: Minor tools bzip2/lzma decompression fixes

The attached patch cleans up a few minor problems in the bzip2/lzma
decompression support, pointed out by Jiri in internal review.  In
particular, it fixes a possible memory leak on realloc() error, fixes
a shifting typo, and changes the xc_dom_printf() calls to be a bit
clearer about which compression routine is in use.
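The usual realloc() leak comes from assigning its result straight back to the only pointer to the old block. On failure, realloc() returns NULL but leaves the old block allocated. Below is a generic sketch of the safe pattern. The helper name is hypothetical, and this is not the libxc code itself.

```c
#include <stdlib.h>

/* Hypothetical helper: grow *buf to newsize bytes. On failure the old
 * buffer is freed and *buf set to NULL, so nothing leaks. */
int grow_buffer(unsigned char **buf, size_t newsize)
{
    unsigned char *tmp = realloc(*buf, newsize);

    if (tmp == NULL) {
        /* The old block is still allocated; "*buf = realloc(...)" would
         * have overwritten the only pointer to it. */
        free(*buf);
        *buf = NULL;
        return -1;
    }
    *buf = tmp;
    return 0;
}
```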

Signed-off-by: Chris Lalancette <clalance@redhat.com>
Keir Fraser [Mon, 23 Nov 2009 07:12:06 +0000 (07:12 +0000)]
xend: Add support for XCP Windows PV drivers

This patch adds support for XCP Windows paravirtual drivers to run on
Xen. The drivers are currently provided in binary-only format from
Citrix. At a minimum, this patch is useful for performance comparisons
vs GPLPV drivers.

Live migration and save/resume are functional but set the guest clock
to the 1970s. The clock must be adjusted manually for the guest's NTP
to resume accurate timekeeping.

Before rebooting Windows at the end of driver installation, create the
registry key
HKLM\System\CurrentControlSet\Services\xenevtchn\Parameters and add to
it a DWORD called SetFlags with a value of 0x10000000.

Signed-off-by: Keith Coleman <keith@scaltro.com>
Keir Fraser [Mon, 23 Nov 2009 07:10:56 +0000 (07:10 +0000)]
Remus: remove Py_RETURN_NONE for Python 2.3

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Keir Fraser [Mon, 23 Nov 2009 07:07:08 +0000 (07:07 +0000)]
Remus: fix a warning

This patch fixes the following warning:
  xen/lowlevel/checkpoint/libcheckpoint.c: In function
  `delete_suspend_timer':
  xen/lowlevel/checkpoint/libcheckpoint.c:352: warning: assignment
  makes integer from pointer without a cast

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Keir Fraser [Mon, 23 Nov 2009 07:06:39 +0000 (07:06 +0000)]
libxenlight: fix compilation error for ia64

xc_cpuid_apply_policy() and HVM_PARAM_VIRIDIAN are defined on x86
only.

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Keir Fraser [Mon, 23 Nov 2009 07:06:10 +0000 (07:06 +0000)]
[IA64] Remus: ia64 counterpart of 07f6d9047af4

This patch adds callbacks to xc_domain_save().

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Keir Fraser [Mon, 23 Nov 2009 07:05:34 +0000 (07:05 +0000)]
docs: descriptions of PSCSI_HBA and DSCSI_HBA

Add descriptions of PSCSI_HBA class and DSCSI_HBA class to XenAPI
document.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Mon, 23 Nov 2009 07:04:54 +0000 (07:04 +0000)]
libxenlight: fix memory leaks

In particular:

- all the temporary flexarrays allocated in the create
  device functions must be freed;

- all the strings that don't need to be modified can be added as they
  are to these temporary flexarrays instead of duplicating them;

- any data returned to the user shouldn't be added to the global
  memory tracker so that the user can free it whenever he wishes.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 23 Nov 2009 07:03:01 +0000 (07:03 +0000)]
VT-d: Call pci_enable_acs() in pci_add_device_ext()

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Keir Fraser [Mon, 23 Nov 2009 07:01:51 +0000 (07:01 +0000)]
libxenlight: check for early failures of qemu-dm

This patch makes xl create check whether qemu-dm has started
correctly, and causes it to fail immediately with appropriate errors
if not.  There are other bugfixes too.

More specifically:

 * libxl_create_device_model forks twice rather than once so that the
   process which calls libxl does not end up being the actual parent
   of qemu.  That avoids the need for the qemu-dm process to be reaped
   at some indefinite time in the future.

 * The first fork generates an intermediate process which is
   responsible for writing the qemu-dm pid to xenstore and then merely
   waits to collect and report on qemu-dm's exit status during
   startup.  New arguments to libxl_create_device_model allow the
   preservation of its pid so that a later call can check whether the
   startup is successful.

 * The core of this functionality (the double fork, waitpid, signal
   handling and so forth) is abstracted away into a new facility
   libxl_spawn_... in libxl_exec.c.
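The double-fork arrangement in the bullets above can be sketched roughly as follows with plain POSIX fork()/waitpid(). This is a hypothetical simplification, not the libxl_spawn_... code. The function name is invented, and the pidfd parameter stands in for "write the qemu-dm pid to xenstore". In this sketch the intermediate simply waits for the child to exit. The real code only needs to watch the child during startup, after which the child can be left to init.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: the caller gets the pid of an intermediate process
 * that starts argv[0] and reaps it, so the caller is never the direct
 * parent of the long-running program. */
pid_t spawn_via_intermediate(char *const argv[], int pidfd)
{
    pid_t intermediate = fork();

    if (intermediate != 0)
        return intermediate;        /* caller: intermediate's pid, or -1 */

    /* Intermediate process: start the real program. */
    pid_t child = fork();
    if (child < 0)
        _exit(127);
    if (child == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }

    /* Stand-in for recording the pid in xenstore. */
    ssize_t w = write(pidfd, &child, sizeof(child));
    (void)w;

    /* Collect the child's status and report it as our own exit code. */
    int status;
    if (waitpid(child, &status, 0) < 0)
        _exit(127);
    _exit(WIFEXITED(status) ? WEXITSTATUS(status) : 127);
}
```

The caller can then waitpid() on the intermediate's pid to learn whether the program failed early. The intermediate exits with the child's status and uses 127 when exec fails.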

Consequential changes:

 * libxl_wait_for_device_model now takes a callback function parameter
   which is called on each loop iteration and allows the caller to
   abort the wait.

 * libxl_exec no longer calls fork; there is a new libxl_fork.

 * There is a hook to override waitpid, which will be necessary for
   some callers.

Remaining problems and other issues I noticed or we found:

 * The error handling is rather inconsistent still and lacking in
   places.

 * destroy_device_model can kill random dom0 processes (!)

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Keir Fraser [Mon, 23 Nov 2009 07:00:08 +0000 (07:00 +0000)]
libxenlight: correct broken osdeps.[ch] and make #includes consistent

osdeps.[hc] previously mistakenly declared and defined [v]asprintf.
These functions are available in the libc on most platforms.  Also,
osdeps.h is used by xc.c but xc.c is not part of the library, so
osdeps.h is part of the public interface and should have a better
name.

So now, instead:

 * osdeps.h is libxl_osdeps.h.

 * _GNU_SOURCE is #defined in libxl_osdeps.h so that we get the system
   [v]asprintf (and various other functions)

 * libxl_osdeps.h is included first in every libxl*.c file (it needs
   to come before any system headers so that _GNU_SOURCE takes effect).

 * osdeps.[hc] only provide their own reimplementation of [v]asprintf
   if NEED_OWN_ASPRINTF is defined.  Currently it is never defined,
   but this is provided for any platform which needs it.

 * While I was editing the #includes in each .c file, I put them all
   into the same order: "libxl_osdeps.h", then system headers,
   then local headers.

 * xs.h is included in libxl.h.  This is needed for "bool", which
   must not be typedefed in libxl.h because otherwise we get a
   duplicate definition when including xs.h.
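Below is a minimal sketch of the kind of [v]asprintf fallback that NEED_OWN_ASPRINTF is meant to enable. It assumes a C99 vsnprintf() that returns the required length when given a NULL buffer. The my_ prefix is hypothetical and only keeps the sketch from colliding with libc's own functions. This is not the osdeps.c implementation.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly malloc()ed buffer sized by a first vsnprintf()
 * pass. Returns the string length, or -1 on error. */
int my_vasprintf(char **strp, const char *fmt, va_list ap)
{
    va_list ap2;
    int len;

    va_copy(ap2, ap);                     /* the sizing pass consumes ap */
    len = vsnprintf(NULL, 0, fmt, ap);
    if (len < 0 || (*strp = malloc((size_t)len + 1)) == NULL) {
        va_end(ap2);
        return -1;
    }
    vsnprintf(*strp, (size_t)len + 1, fmt, ap2);
    va_end(ap2);
    return len;
}

int my_asprintf(char **strp, const char *fmt, ...)
{
    va_list ap;
    int ret;

    va_start(ap, fmt);
    ret = my_vasprintf(strp, fmt, ap);
    va_end(ap);
    return ret;
}
```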

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Mon, 23 Nov 2009 06:59:06 +0000 (06:59 +0000)]
libxenlight: Clean up logging arrangements

* Introduce new variants of the logging functions which include
  errno values (converted using strerror) in the messages passed to
  the application's logging callback.

* Use the new errno-including logging functions everywhere where
  appropriate.  In general, xc_... functions return errno values or 0;
  xs_... functions return 0 or -1 (or some such) setting errno.

* When libxl_xs_get_dompath fails, do not treat it as an allocation
  error.  It isn't: it usually means xenstored failed.

* Remove many spurious \n's from log messages.  (The application's log
  callback is expected to add a \n if it wants one, so libxl's
  logging functions should be passed strings without \n.)
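Below is a minimal sketch of an errno-including log call in the style described above. It assumes a callback that takes an opaque pointer and a message. The names and signatures are hypothetical, not libxl's actual logging API. The formatted message deliberately has no trailing newline.

```c
#include <errno.h>   /* E* constants that callers pass in */
#include <stdio.h>
#include <string.h>

/* Hypothetical application-supplied callback; it adds any '\n' itself. */
typedef void (*log_callback)(void *data, const char *msg);

/* Build "msg: strerror(errnoval)", or just "msg" when errnoval is 0. */
int format_errno_msg(char *buf, size_t len, int errnoval, const char *msg)
{
    if (errnoval != 0)
        return snprintf(buf, len, "%s: %s", msg, strerror(errnoval));
    return snprintf(buf, len, "%s", msg);
}

/* Callers pass errno explicitly, e.g. right after an xs_... call fails,
 * or the errno value returned by an xc_... call. */
void log_errno(log_callback cb, void *data, int errnoval, const char *msg)
{
    char buf[256];

    format_errno_msg(buf, sizeof(buf), errnoval, msg);
    cb(data, buf);
}
```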

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Keir Fraser [Mon, 23 Nov 2009 06:58:19 +0000 (06:58 +0000)]
x86: enable directed EOI

This patch enables directed EOI on the latest processors. With it, the
EOI broadcast is suppressed upon a LAPIC EOI, so the VMM must perform
a directed EOI to the IOxAPIC that generated the interrupt by writing
to its EOI register. (Please refer to SDM 3A 10.5.5.)

This is useful for ioapic_ack_old, as it avoids the spurious interrupt
storm that is the reason ioapic_ack_new is used.

Signed-Off-By: Zhai Edwin <edwin.zhai@intel.com>
Keir Fraser [Mon, 23 Nov 2009 06:56:01 +0000 (06:56 +0000)]
vt-d: enable PCI ACS P2P upstream forwarding

This patch enables P2P upstream forwarding in ACS-capable PCIe
switches.  The enabling is conditioned on the iommu_enabled variable.
This code solves two potential problems in virtualization environments
where a PCIe device is assigned to a guest domain using a hardware
IOMMU such as VT-d:

1) Unintentional failure caused by a guest physical address programmed
into the device's DMA that happens to match the memory address range
of another downstream port in the same PCIe switch.  This causes the
PCI transaction to go to the matching downstream port instead of going
to the root complex to be translated by VT-d, as it should.

2) Malicious guest software intentionally attacks another downstream
PCIe device by programming into the assigned device a DMA address
that matches the memory address range of that downstream PCIe port.

Corresponding ACS filtering code is already in the upstream control
panel code: it does not allow PCI device passthrough to a guest if the
device is behind a PCIe switch that lacks ACS capability, or that has
ACS capability but does not have it enabled.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Keir Fraser [Mon, 23 Nov 2009 06:54:03 +0000 (06:54 +0000)]
libxenlight: implement support for pv guests

This patch makes PV guests work correctly with libxenlight.  It also
implements support for vfb and vkbd, starting qemu in xenpv mode. Both
xenconsole and qemu are supported as console backends.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 23 Nov 2009 06:52:35 +0000 (06:52 +0000)]
xend: Remove tab indents

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Mon, 23 Nov 2009 06:48:14 +0000 (06:48 +0000)]
tmem: fix double-free bug

Tmem double-frees a high-level data structure causing memory
corruption under certain circumstances.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Mon, 23 Nov 2009 06:47:29 +0000 (06:47 +0000)]
rombios: don't busy-wait for keystrokes

Spinning waiting for the keyboard is a bit rude on a virtual
machine. Wait for an interrupt instead.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Mon, 23 Nov 2009 06:46:58 +0000 (06:46 +0000)]
tmem: printk too chatty when tmem enabled

Two gdprintk's that are rarely encountered with tmem disabled
are frequent but meaningless when tmem is enabled.  Printing
these tens-to-hundreds of times per second (in certain
circumstances even higher) slows down domain execution.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Mon, 23 Nov 2009 06:45:03 +0000 (06:45 +0000)]
tmem: fix regression from c/s 19886 "Remove page-scrub lists and async scrubbing"

Fix incorrect page_list macro choice from page-scrub code cleanup.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Mon, 23 Nov 2009 06:43:50 +0000 (06:43 +0000)]
x86 shadow: Relax assertion in VRAM tracking code

The original assertion is too strict, as it includes the A/D bits of
the PTE, which (by design) can change under our feet.
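In x86 page-table entries, Accessed is bit 5 and Dirty is bit 6, and the processor may set either one at any time. Below is a sketch of a comparison that ignores those two bits. The macro and helper names are hypothetical, and this is not the shadow code's actual assertion.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE flags that hardware sets behind software's back. */
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)

/* True if a and b differ at most in their Accessed/Dirty bits. */
static inline bool pte_same_ignoring_ad(uint64_t a, uint64_t b)
{
    return ((a ^ b) & ~(PTE_ACCESSED | PTE_DIRTY)) == 0;
}
```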

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Mon, 23 Nov 2009 06:42:12 +0000 (06:42 +0000)]
blktap2: fix libgcrypt detection

If we want to check the functionality of libgcrypt, we shouldn't test
a function only exported by openssl, but instead the one actually used
in the code.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser [Tue, 17 Nov 2009 13:07:16 +0000 (13:07 +0000)]
Revert 20437:64599a2d310d

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 17 Nov 2009 08:05:52 +0000 (08:05 +0000)]
blktap2: Remove uninitialised variable rc from tdremus_close().

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Sat, 14 Nov 2009 10:32:59 +0000 (10:32 +0000)]
tmem: fix domain shutdown problem/race

Tmem fails to put_domain so a dying domain never gets
properly shut down.  Also, fix race condition when
domain is dying by not allowing any new ops to succeed.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Sat, 14 Nov 2009 10:25:19 +0000 (10:25 +0000)]
xend: Remove extraneous logging from pyxc_physinfo().

Also fixes 32-bit build.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Sat, 14 Nov 2009 08:09:50 +0000 (08:09 +0000)]
xend: Balloon down memory to achieve enough DMA32 memory for PV guests
with PCI pass-through to launch successfully.

If the user hasn't used the dom0_mem= boot parameter, the privileged
domain usurps all of the memory. During launch of PV guests with PCI
pass-through we ratchet down the memory for the privileged domain to
the required memory for the PV guest. However, for PV guests with PCI
pass-through we do not take into account that the PV guest is going to
swap its SWIOTLB memory for DMA32 memory (in fact, swap 64MB of
it). This patch balloons down the privileged domain so that there are
64MB of DMA32 memory available.

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 13 Nov 2009 22:13:59 +0000 (22:13 +0000)]
stubdom: Fix up pciutils.patch

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 13 Nov 2009 22:00:19 +0000 (22:00 +0000)]
xsm: Dynamic update to device ocontexts

Added the ability to add and delete ocontexts dynamically on a running
system.  Two new commands have been added to the xsm hypercall, add
and delete ocontext.  Twelve new library functions have been
implemented that use the hypercall commands to label and unlabel
pirqs, PCI devices, I/O ports and memory.  The base policy has been
updated so dom0 has the ability to use the hypercall commands by
default.  Items added to the list will not be present next time the
system reloads.  They will need to be added to the static policy.

Signed-off-by: George Coker <gscoker@alpha.ncsc.mil>
Signed-off-by: Paul Nuzzi <pjnuzzi@tycho.ncsc.mil>

Keir Fraser [Fri, 13 Nov 2009 21:59:20 +0000 (21:59 +0000)]
xen: allow stubdom to call unmap_domain_pirq

There is one missing IS_PRIV/IS_PRIV_FOR change in Xen needed to make
xc_physdev_unmap_pirq work with stubdoms.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 13 Nov 2009 21:58:30 +0000 (21:58 +0000)]
pcifront: implement dynamic connections and disconnections

This patch implements dynamic connections and disconnections in
pcifront. This feature is required to properly support PCI hotplug,
because when no PCI devices are assigned to a guest, xend removes the
PCI backend altogether.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 13 Nov 2009 21:54:44 +0000 (21:54 +0000)]
xend: call xc_assign_device for all the devices to hotplug

This patch fixes a couple of issues with PCI passthrough in xend,
previously reported by Cui Dexuan. The main problem is that
xc_assign_device is called only for the first device hotplugged into
the guest and not for subsequent ones.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 13 Nov 2009 21:09:33 +0000 (21:09 +0000)]
remus: Add missing python __init__.py file
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 13 Nov 2009 17:21:13 +0000 (17:21 +0000)]
remus: Add missing unistd.h include to libcheckpoint.c

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 13 Nov 2009 17:02:25 +0000 (17:02 +0000)]
remus: Fix makefiles for indentation
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 13 Nov 2009 15:46:58 +0000 (15:46 +0000)]
Merge

Keir Fraser [Fri, 13 Nov 2009 15:38:57 +0000 (15:38 +0000)]
vtd: Make vtd faults dmesg more readable

This simple patch makes the VT-d fault messages in dmesg more readable
and more helpful for debugging.

Signed-Off-By: Zhai Edwin <edwin.zhai@intel.com>
Keir Fraser [Fri, 13 Nov 2009 15:34:46 +0000 (15:34 +0000)]
Remus: support for network buffering

This currently relies on the third-party IMQ patch (linuximq.net)
being present in dom0. The plan is to replace this with a direct hook
into netback eventually.

This patch includes a pared-down and patched copy of ebtables to
install IMQ on a VIF.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Fri, 13 Nov 2009 15:34:03 +0000 (15:34 +0000)]
Remus: add control script to activate remus on a VM

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Fri, 13 Nov 2009 15:33:37 +0000 (15:33 +0000)]
Remus: add python control extensions

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Fri, 13 Nov 2009 15:31:45 +0000 (15:31 +0000)]
x86: Change the interface physdev_map_pirq to support new dom0.

It also keeps compatibility with old dom0.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Fri, 13 Nov 2009 15:31:16 +0000 (15:31 +0000)]
libxenlight: implement pci passthrough

This patch implements PCI passthrough (hotplug and coldplug) in
libxenlight. It also adds three new commands to xl: pci-attach,
pci-detach and pci-list.
Currently FLR on a device is done by writing to
/sys/bus/pci/drivers/pciback/do_flr; pciback do_flr is present in
both the XCI and XCP 2.6.27 kernels.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 13 Nov 2009 15:30:24 +0000 (15:30 +0000)]
libxenlight: fix name to domid conversion

This patch makes sure that the domain name to domid conversion is
correct, by cross-referencing the information found in xenstore with
the list of running domains.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Thu, 12 Nov 2009 15:34:37 +0000 (15:34 +0000)]
x86: Disable spinlock checks temporarily while bringing a CPU online.

This is safe, as described in a code comment. Also fix up another
comment in start_secondary() while we're there.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 12 Nov 2009 13:15:40 +0000 (13:15 +0000)]
Don't assume vcpu_id's are contiguous in alloc_vcpu

When CPUs are hot-added, this assumption breaks, because hot-added
CPUs may be brought online by dom0 in arbitrary order. This patch
avoids making this assumption while still linking vcpus in ascending
order of identifier.
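Linking vcpus in ascending id order without assuming contiguous ids amounts to a sorted insertion into a singly linked list. Below is a simplified sketch. The struct is reduced to the two fields the insertion needs, and the helper name is hypothetical. This is not the alloc_vcpu() code.

```c
#include <stddef.h>

/* Stripped-down stand-in for a vcpu: just an id and a list link. */
struct vcpu {
    int vcpu_id;
    struct vcpu *next_in_list;
};

/* Insert v so the list stays sorted by vcpu_id, whatever order (and
 * whatever gaps) the ids arrive in. */
void link_vcpu_sorted(struct vcpu **head, struct vcpu *v)
{
    struct vcpu **pp = head;

    /* Advance past every element with a smaller id. */
    while (*pp != NULL && (*pp)->vcpu_id < v->vcpu_id)
        pp = &(*pp)->next_in_list;
    v->next_in_list = *pp;
    *pp = v;
}
```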

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 12 Nov 2009 13:02:27 +0000 (13:02 +0000)]
Revert 20045:db1890f07661 "Revert alloc_idle_vcpu()..."

The old implementation of alloc_idle_vcpu() is unnecessary since
arch-specific code ensures that a single idle domain supports NR_CPUS
vcpus, despite the usual limit of MAX_VIRT_CPUS for ordinary domains.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 12 Nov 2009 11:59:18 +0000 (11:59 +0000)]
x86: Remove non-CONFIG_HOTPLUG_CPU code, and general cleanup.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 12 Nov 2009 11:43:21 +0000 (11:43 +0000)]
Support physical CPU hot-add in xen hypervisor

This patch adds CPU hot-add support.
a) It marks all CPUs as possible at boot if CONFIG_HOTPLUG_CPU is
set. Note that this increases the per_cpu area.

b) When a CPU is added through a hypercall, it is marked as present
and offline, and its NUMA information is set up if NUMA is
supported. dom0 must then bring the CPU online explicitly.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Thu, 12 Nov 2009 11:42:36 +0000 (11:42 +0000)]
Update pcpu_info hypercall interface

This patch changes the XENPF_get_cpuinfo interface to pass
information about only one pCPU per hypercall. It also replaces
xenpf_resource_hotplug with XENPF_cpu_online/offline.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Thu, 12 Nov 2009 11:42:02 +0000 (11:42 +0000)]
A few trivial cleanups

Alphabetize object files and guest config options for better
readability. Also remove svm interrupt prototypes which do not
exist.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Thu, 12 Nov 2009 11:40:44 +0000 (11:40 +0000)]
xend/xm: Add PSCSI_HBA class and DSCSI_HBA class to XenAPI

Until now, XenAPI (not xapi) has supported only LUN assignment mode
for pvSCSI. These patches add support for HOST assignment mode as
well; to do so, they add a PSCSI_HBA class and a DSCSI_HBA class to
XenAPI.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Thu, 12 Nov 2009 11:39:51 +0000 (11:39 +0000)]
PoD: Handle operations properly when domain is dying

No populate-on-demand activities should happen when a domain is dying.
In particular, it is a bug for memory to be added to the PoD cache when
d->is_dying is non-zero, since if this happens after the cache has
been emptied, these pages will never be freed. This may cause "zombie
domains" to linger.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Wed, 11 Nov 2009 13:11:44 +0000 (13:11 +0000)]
blktap2: Remove gnu89-inline option from CFLAGS

Not supported by older versions of gcc.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 10 Nov 2009 13:04:45 +0000 (13:04 +0000)]
Mark CPU present when it is detected

Currently a CPU is marked as present only after it has been kicked off
successfully, i.e. it is not present until it has been brought up.
This patch marks a CPU as present when it is detected (either through
the MPS table or ACPI). If it can't be brought up successfully, it is
marked as non-present again.  This change is mainly for CPU hot-plug:
as discussed, physical CPU hot-add takes two steps. A CPU is first
marked as present, and is later brought online.

Also, smp_boot_cpus() now only needs to scan present CPUs instead of
looping from 0 to NR_CPUS. With this change, bios_cpu_apicid is no
longer needed.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Tue, 10 Nov 2009 13:03:42 +0000 (13:03 +0000)]
Hypercall to expose physical CPU information.

It also makes some changes to the current CPU online/offline logic:
1) CPU online/offline now triggers a vIRQ to dom0 to notify it of
status changes.
2) It adds a platform operation interface to online/offline a
physical CPU. Currently the CPU online/offline interface is in sysctl,
which can't be triggered from the kernel. With this change, it is
possible to trigger CPU online/offline in dom0 through a sysfs
interface.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Tue, 10 Nov 2009 13:01:09 +0000 (13:01 +0000)]
tools: Make build work again on NetBSD

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 9 Nov 2009 22:41:23 +0000 (22:41 +0000)]
libxl: Call to open() must specify mode with O_CREAT.
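When O_CREAT is given, open() takes a third mode argument and POSIX requires it. Without that argument, the permissions of a newly created file are unpredictable. Below is a minimal illustration. The helper name is hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or truncate) path for writing. Because O_CREAT is set, an
 * explicit mode must be passed; the result is still filtered by umask. */
int create_file(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}
```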

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 9 Nov 2009 22:30:21 +0000 (22:30 +0000)]
unlzma: Remove 'inline' decl from non-static function.

Breaks the build with some versions of gcc.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 9 Nov 2009 20:43:40 +0000 (20:43 +0000)]
x86: Fix clip_to_limit().

There are issues with updating the e820 map in the middle of a loop
that iterates over it. For example, after memmove(&e820.map[i],
&e820.map[i+1], ...), the original e820.map[i+1] becomes the current
e820.map[i], but the next loop iteration looks at index i+1, so the
original e820.map[i+1] is skipped.

Fix and clarify the code by making a double loop.
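The skipped-entry problem goes away if slot i is re-examined after each deletion. The sketch below uses a single loop that advances i only when nothing was removed. This is a swapped-in alternative to the double loop the commit describes. The struct layout and helper name are simplified assumptions, not Xen's e820 code, and entry types are ignored.

```c
#include <stdint.h>
#include <string.h>

struct e820entry { uint64_t addr, size; uint32_t type; };

/* Drop entries starting at or above limit and truncate an entry that
 * straddles it. */
void clip_entries(struct e820entry *map, unsigned int *nr, uint64_t limit)
{
    unsigned int i = 0;

    while (i < *nr) {
        if (map[i].addr >= limit) {
            /* Shift the tail down; the old map[i+1] now sits in slot i. */
            memmove(&map[i], &map[i + 1], (*nr - i - 1) * sizeof(map[0]));
            (*nr)--;
            continue;               /* re-examine slot i, do not skip it */
        }
        if (map[i].addr + map[i].size > limit)
            map[i].size = limit - map[i].addr;
        i++;
    }
}
```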

Original bug discovery and fix by Xiao Guangrong <ericxiao.gr@gmail.com>

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 9 Nov 2009 20:06:48 +0000 (20:06 +0000)]
cmdline_parse_early: fix parse 'edd=' option

If 'edd=' is default, it should decrease "opt_edd", not "opt_edid".

Signed-off-by: Xiao Guangrong <ericxiao.gr@gmail.com>
Keir Fraser [Mon, 9 Nov 2009 20:05:43 +0000 (20:05 +0000)]
e820: fix e820_change_range_type()

In the following case, e820_change_range_type() returns success even
though the operation actually failed: [s, e] is in the middle of
[rs, re] and e820->nr_map+1 >= ARRAY_SIZE(e820->map). This patch
fixes it.
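Changing the type of a sub-range that lies strictly inside an entry splits that entry into three pieces, so two extra map slots are needed. The failing condition above, nr_map+1 >= ARRAY_SIZE(map), is therefore the same as nr_map + 2 > ARRAY_SIZE(map). Below is a hypothetical capacity check that assumes rs <= s <= e <= re. It is not the e820_change_range_type() code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is there room in the map to give [s, e] a new type inside [rs, re]? */
bool room_for_retype(uint64_t s, uint64_t e, uint64_t rs, uint64_t re,
                     unsigned int nr_map, unsigned int map_size)
{
    unsigned int extra;

    if (s == rs && e == re)
        extra = 0;   /* whole entry changes type in place */
    else if (s == rs || e == re)
        extra = 1;   /* entry splits into two pieces */
    else
        extra = 2;   /* middle of the entry: three pieces */
    return nr_map + extra <= map_size;
}
```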

Signed-off-by: Xiao Guangrong <ericxiao.gr@gmail.com>
Keir Fraser [Mon, 9 Nov 2009 19:54:28 +0000 (19:54 +0000)]
libxenlight: initial libxenlight implementation under tools/libxl

Signed-off-by: Vincent Hanquez <Vincent.Hanquez@eu.citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 9 Nov 2009 19:45:06 +0000 (19:45 +0000)]
blktap2: add remus driver

Blktap2 port of the remus disk driver. Backwards compatible with the
blktap1 implementation.

Signed-off-by: Ryan O'Connor <rjo@cs.ubc.ca>
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:41:16 +0000 (19:41 +0000)]
Remus: Fixup for tap:tapdisk syntax in remus uname

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:40:48 +0000 (19:40 +0000)]
blktap2: only open driver stack once

Currently blktap2 opens a driver stack, closes it, and re-opens
it. This causes problems with our remus driver: the primary may
connect to the backup in between the first and second open.

This is a temporary fix.

Signed-off-by: Ryan O'Connor <rjo@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:40:14 +0000 (19:40 +0000)]
blktap2: configurable driver chains

Blktap2 allows block device drivers to be layered to create more
advanced virtual block devices. However, composing a layered driver is
not exposed to the user. This patch fixes this, and allows the user to
explicitly specify a driver chain when starting a tapdisk process,
using the pipe character ('|') to separate layers in a blktap2
configuration string.

For example, the command:
  ~$ tapdisk2 -n "log:|aio:/path/to/file.img"
will create a blktap2 device where read and write requests are passed
to the 'log' driver, then forwarded to the 'aio' driver.
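Below is a minimal sketch of splitting such a chain string on '|' into per-layer driver names and arguments. The helper, its fixed-size buffers, and the "driver:argument" layer format (inferred from the example above) are assumptions. This is not the tapdisk2 parser.

```c
#include <stdio.h>
#include <string.h>

/* Split spec into layers, outermost first. Returns the number of layers,
 * or -1 on a malformed layer or too many layers. */
int parse_driver_chain(const char *spec, char names[][16], char args[][256],
                       int max_layers)
{
    const char *p = spec;
    int n = 0;

    while (n < max_layers) {
        const char *bar = strchr(p, '|');
        size_t len = bar ? (size_t)(bar - p) : strlen(p);
        const char *colon = memchr(p, ':', len);

        if (colon == NULL)
            return -1;              /* each layer must be "driver:arg" */
        size_t name_len = (size_t)(colon - p);
        snprintf(names[n], 16, "%.*s", (int)name_len, p);
        snprintf(args[n], 256, "%.*s", (int)(len - name_len - 1), colon + 1);
        n++;
        if (bar == NULL)
            return n;
        p = bar + 1;                /* continue with the next layer */
    }
    return -1;
}
```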

Signed-off-by: Ryan O'Connor <rjo@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:19:27 +0000 (19:19 +0000)]
Remus: Make checkpoint buffering HVM-aware

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:17:22 +0000 (19:17 +0000)]
Remus: Do bitmap scan word-by-word before bit-by-bit.
For sparse bitmaps and large domains this saves a lot of time.
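Below is a minimal sketch of the word-by-word idea. It assumes the bitmap is stored as an array of unsigned long, and the function name is hypothetical. At each word boundary, an all-zero word is skipped with one comparison, and only non-zero words are examined bit by bit.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Index of the first set bit at or after start, or nbits if none. */
size_t next_set_bit(const unsigned long *map, size_t nbits, size_t start)
{
    size_t i = start;

    while (i < nbits) {
        /* Fast path: skip a whole word when it contains no set bits. */
        if (i % BITS_PER_WORD == 0 && map[i / BITS_PER_WORD] == 0) {
            i += BITS_PER_WORD;
            continue;
        }
        if (map[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
            return i;
        i++;
    }
    return nbits;
}
```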

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:16:48 +0000 (19:16 +0000)]
Remus: Do not bother with to_skip/to_fix bitmaps after the first final round.

Signed-off-by: Geoffrey Lefebvre <geoffrey@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:16:19 +0000 (19:16 +0000)]
Remus: Buffer checkpoint data locally until domain has resumed execution.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:15:34 +0000 (19:15 +0000)]
Remus: Initiate failover if a packet is not received every 500ms.

This breaks checkpointing at lower frequencies (checkpoint intervals
longer than 500ms), and should be made optional.
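
A minimal sketch of such a heartbeat check in C. It assumes that the checkpoint stream arrives on a file descriptor and uses POSIX `poll()`. `wait_for_heartbeat()` is hypothetical and is not the Remus implementation. The sketch also illustrates why a fixed 500ms timeout presumably conflicts with checkpoint intervals longer than that: no packet arrives within the window.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

#define HEARTBEAT_TIMEOUT_MS 500

/*
 * Wait up to 'timeout_ms' for data on 'fd'. Returns true if data arrived.
 * Returns false on timeout or poll() failure, in which case the caller
 * would initiate failover.
 */
bool wait_for_heartbeat(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    return rc > 0 && (pfd.revents & POLLIN);
}
```

A caller would loop on `wait_for_heartbeat(fd, HEARTBEAT_TIMEOUT_MS)`, reading a packet each time it returns true, and start failover on the first false.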

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
16 years agoRemus: Make xc_domain_restore loop until the fd is closed.
Keir Fraser [Mon, 9 Nov 2009 19:14:03 +0000 (19:14 +0000)]
Remus: Make xc_domain_restore loop until the fd is closed.

The tail containing the final PFN table, VCPU contexts and
shared_info_page is buffered, then the read loop is restarted.
After the first pass, incoming pages are buffered until the next tail
is read, completing a new consistent checkpoint. At this point, the
memory changes are applied and the loop begins again. When the fd read
fails, the tail buffer is processed.
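
The buffering idea can be shown with a toy model in C. Incoming pages go into a staging area and are applied to "memory" only when a tail record completes the checkpoint, so a stream cut off mid-checkpoint never leaves a half-applied state. All types and names here are hypothetical. The sketch omits the PFN table, VCPU contexts and shared_info handling carried in the real tail.

```c
#include <stddef.h>
#include <string.h>

#define NPAGES 16
#define PAGE 8                      /* tiny "page" for illustration */

enum rec_kind { REC_PAGE, REC_TAIL };

struct record {
    enum rec_kind kind;
    size_t pfn;                     /* REC_PAGE only */
    char data[PAGE];                /* REC_PAGE only */
};

struct restore_state {
    char memory[NPAGES][PAGE];      /* state of the restored domain */
    char staged[NPAGES][PAGE];      /* pages received since the last tail */
    int dirty[NPAGES];
    int checkpoints;                /* consistent checkpoints applied */
};

/* Feed one record: pages are staged; a tail commits the staged pages. */
void restore_feed(struct restore_state *s, const struct record *r)
{
    if (r->kind == REC_PAGE) {
        if (r->pfn >= NPAGES)
            return;                 /* ignore out-of-range pages */
        memcpy(s->staged[r->pfn], r->data, PAGE);
        s->dirty[r->pfn] = 1;
        return;
    }
    for (size_t i = 0; i < NPAGES; i++) {
        if (s->dirty[i]) {
            memcpy(s->memory[i], s->staged[i], PAGE);
            s->dirty[i] = 0;
        }
    }
    s->checkpoints++;
}
```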

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
16 years agoRemus: Add callbacks for suspend, postcopy and preresume in xc_domain_save.
Keir Fraser [Mon, 9 Nov 2009 19:06:25 +0000 (19:06 +0000)]
Remus: Add callbacks for suspend, postcopy and preresume in xc_domain_save.

This makes it possible to perform repeated checkpoints.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
16 years agox86, hvm: Make host TscInvariant CPUID flag visible to guest by default.
Keir Fraser [Mon, 9 Nov 2009 18:54:27 +0000 (18:54 +0000)]
x86, hvm: Make host TscInvariant CPUID flag visible to guest by default.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agox86_32: Respect e820 map when populating Xen heap.
Keir Fraser [Mon, 9 Nov 2009 08:19:55 +0000 (08:19 +0000)]
x86_32: Respect e820 map when populating Xen heap.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agox86, cpuid: mask TSC invariant bit for PV and HVM domains if migration
Keir Fraser [Mon, 9 Nov 2009 08:03:30 +0000 (08:03 +0000)]
x86, cpuid: mask TSC invariant bit for PV and HVM domains if migration
is not disabled and TSC is not emulated
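
CPUID leaf 0x80000007 reports the invariant-TSC flag in EDX bit 8. The masking rule described above can be sketched as a pure function. `guest_apm_edx()` and its parameters are hypothetical and are not Xen's CPUID policy code.

```c
#include <stdbool.h>
#include <stdint.h>

/* EDX bit 8 of CPUID leaf 0x80000007: invariant TSC. */
#define CPUID_TSC_INVARIANT (1u << 8)

/*
 * Compute the leaf 0x80000007 EDX value a guest sees. The host's flag is
 * passed through unless the domain may migrate and its TSC is not
 * emulated; in that case the guest could land on a host whose TSC
 * behaves differently.
 */
uint32_t guest_apm_edx(uint32_t host_edx, bool migration_disabled,
                       bool tsc_emulated)
{
    if (!migration_disabled && !tsc_emulated)
        return host_edx & ~CPUID_TSC_INVARIANT;
    return host_edx;
}
```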

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agox86/dom0: support bzip2 and lzma compressed bzImage payloads
Keir Fraser [Mon, 9 Nov 2009 07:52:27 +0000 (07:52 +0000)]
x86/dom0: support bzip2 and lzma compressed bzImage payloads

This matches functionality in the tools already supporting the same
for DomU-s.

Code taken from Linux 2.6.32-rc and adjusted as little as possible to
be usable in Xen.

The question is whether, particularly for non-Linux Dom0-s, plain ELF
images compressed by bzip2 or lzma should also be supported.
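
Compressed payloads are usually told apart by their leading magic bytes. The C sketch below uses the two-byte values that Linux's lib/decompress.c keys on: gzip 0x1f 0x8b, bzip2 "BZ", and lzma 0x5d 0x00. It assumes the payload has already been located inside the bzImage. `detect_payload()` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum payload_fmt { FMT_UNKNOWN, FMT_GZIP, FMT_BZIP2, FMT_LZMA };

static const struct {
    unsigned char magic[2];
    enum payload_fmt fmt;
} formats[] = {
    { { 0x1f, 0x8b }, FMT_GZIP },
    { { 0x42, 0x5a }, FMT_BZIP2 },  /* "BZ" */
    { { 0x5d, 0x00 }, FMT_LZMA },   /* common lzma properties byte + dict */
};

/* Identify the compression format of a payload from its first bytes. */
enum payload_fmt detect_payload(const unsigned char *buf, size_t len)
{
    if (len < 2)
        return FMT_UNKNOWN;
    for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++)
        if (memcmp(buf, formats[i].magic, 2) == 0)
            return formats[i].fmt;
    return FMT_UNKNOWN;
}
```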

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agoxentop: Add two more VBD statistics
Keir Fraser [Thu, 5 Nov 2009 12:00:58 +0000 (12:00 +0000)]
xentop: Add two more VBD statistics

In addition to the VBD read/write request counts, also report VBD
read/write sector counts. This makes VBD throughput easier to observe.
As the method for obtaining this information is OS-dependent, only the
Linux code is added.
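
Sector counts turn into throughput with simple arithmetic on two samples. The C sketch assumes 512-byte sectors and monotonically increasing counters. `vbd_throughput()` is hypothetical and is not xentop code.

```c
#include <stdint.h>

#define SECTOR_BYTES 512u           /* assumed sector size */

/* Average throughput in bytes/s between two samples of a sector counter. */
double vbd_throughput(uint64_t sect_prev, uint64_t sect_now,
                      double interval_s)
{
    if (interval_s <= 0.0 || sect_now < sect_prev)
        return 0.0;                 /* bad interval or counter reset */
    return (double)(sect_now - sect_prev) * SECTOR_BYTES / interval_s;
}
```

For example, 2048 sectors read over 2 seconds gives 2048 * 512 / 2 = 524288 bytes/s.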

Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
16 years agoxc_resume: fix modify_returncode when host width != guest width
Keir Fraser [Wed, 4 Nov 2009 22:32:01 +0000 (22:32 +0000)]
xc_resume: fix modify_returncode when host width != guest width

Also improve checking in xc_domain_resume_any().

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoUpdate QEMU_TAG to f72b6e0ffc3bb84d4442c5a7493bffbdce2a4468
Keir Fraser [Wed, 4 Nov 2009 18:14:02 +0000 (18:14 +0000)]
Update QEMU_TAG to f72b6e0ffc3bb84d4442c5a7493bffbdce2a4468

16 years agoxen passthrough: fix recent regressions
Keir Fraser [Tue, 3 Nov 2009 12:41:54 +0000 (12:41 +0000)]
xen passthrough: fix recent regressions

This patch fixes the recent regressions pointed out by Dexuan, keeping
PCI passthrough working with stubdom too. In particular, calling
device_create when pci_state == 'Initialising' is a mistake: the state
is always Initialising when attaching a device, whereas device_create
has to be called only when the PCI backend is missing.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agox86: improve reporting through XENMEM_machine_memory_map
Keir Fraser [Tue, 3 Nov 2009 12:40:28 +0000 (12:40 +0000)]
x86: improve reporting through XENMEM_machine_memory_map

Since Dom0 derives machine address ranges usable for assigning PCI
device resources from the output of this sub-hypercall, Xen should
make sure it properly reports all ranges not suitable for this (as
either reserved or unusable):
- RAM regions excluded via command line option
- memory regions used by Xen itself (LAPIC, IOAPICs)

While the latter should generally already be excluded by the
BIOS-provided E820 table, this apparently isn't always the case, at
least for IOAPICs. Since Linux has been changed to account for this,
it seems to make sense to do so in Xen as well.

Generally the HPET range should also be excluded here, but since it
isn't being reflected in Dom0's iomem_caps (and can't be, as it's a
sub-page range) I wasn't sure whether adding explicit code for doing
so would be reasonable.
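
Reporting a range such as an IOAPIC page as reserved can be sketched in C. The E820 type codes 1 (RAM) and 2 (reserved) are the standard ones. `report_reserved()` and its helper are hypothetical. They simply append an entry when no reserved entry already covers the range, and they neither keep the map sorted nor split overlapping RAM entries.

```c
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2

struct e820entry { uint64_t addr, size; uint32_t type; };

/* Is [addr, addr + size) wholly inside one entry of the given type? */
static int covered(const struct e820entry *map, int n, uint64_t addr,
                   uint64_t size, uint32_t type)
{
    for (int i = 0; i < n; i++)
        if (map[i].type == type && map[i].addr <= addr &&
            addr + size <= map[i].addr + map[i].size)
            return 1;
    return 0;
}

/* Ensure a range Xen uses itself is reported reserved; returns new count. */
int report_reserved(struct e820entry *map, int n, int max,
                    uint64_t addr, uint64_t size)
{
    if (covered(map, n, addr, size, E820_RESERVED) || n == max)
        return n;
    map[n].addr = addr;
    map[n].size = size;
    map[n].type = E820_RESERVED;
    return n + 1;
}
```

For example, `report_reserved(map, n, max, 0xfec00000, 0x1000)` would add the conventional default IOAPIC page if the firmware map omitted it.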

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agox86: Clean up APIC local timer handling.
Keir Fraser [Tue, 3 Nov 2009 09:33:22 +0000 (09:33 +0000)]
x86: Clean up APIC local timer handling.

1. Writing TMICT=0 disables the timer. Use this fact to simplify and
improve reprogram_timer(). In particular, we always write TMICT, and
write zero when we do not need a timer interrupt.

2. In HPET broadcast timer handler, set TMICT=0 when we mask the APIC
local timer. May as well do this early, before entering deep sleep.

3. In HVM-guest APIC emulation, disable the emulated local timer when
the guest sets TMICT=0. Previously we would issue an immediate
one-shot interrupt.
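
Point 1 can be sketched as follows. `write_tmict()` is a hypothetical stand-in for the real APIC register write, and a deadline of 0 is assumed to mean "no timer needed". TMICT is written on every call. An already-expired deadline is clamped to a count of 1, because writing 0 would disable the timer instead of firing it. This clamping is one possible choice, not necessarily Xen's.

```c
#include <stdint.h>

/* Stand-in for the APIC TMICT register write; records the last value. */
static uint32_t tmict_reg;
static void write_tmict(uint32_t v) { tmict_reg = v; }

/* Program the one-shot local APIC timer for 'deadline_ns' (0 = none). */
void reprogram_timer_sketch(uint64_t now_ns, uint64_t deadline_ns,
                            uint64_t ticks_per_us)
{
    uint64_t ticks;

    if (deadline_ns == 0) {
        write_tmict(0);             /* TMICT=0 disables the timer */
        return;
    }
    ticks = deadline_ns > now_ns
            ? (deadline_ns - now_ns) * ticks_per_us / 1000 : 0;
    if (ticks == 0)
        ticks = 1;                  /* already due: fire, don't disable */
    if (ticks > UINT32_MAX)
        ticks = UINT32_MAX;         /* TMICT is a 32-bit register */
    write_tmict((uint32_t)ticks);
}
```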

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agovmx: Disable vPMU feature by default
Keir Fraser [Tue, 3 Nov 2009 08:40:40 +0000 (08:40 +0000)]
vmx: Disable vPMU feature by default

Signed-off-by: Shan Haitao <haitao.shan@intel.com>
16 years agoLinux vbd hotplug: Speed up finding a loopback device
Keir Fraser [Tue, 3 Nov 2009 08:39:21 +0000 (08:39 +0000)]
Linux vbd hotplug: Speed up finding a loopback device

 - Use the device and inode information provided by losetup to
   determine whether the vbd backing file is in use on another vbd.

 - Use losetup to find a free loopback device.

Signed-off-by: Gary Grebus <gary.grebus@oracle.com>
16 years agoLinux vbd hotplug: Avoid "leaked" loopback devices
Keir Fraser [Tue, 3 Nov 2009 08:38:55 +0000 (08:38 +0000)]
Linux vbd hotplug: Avoid "leaked" loopback devices

Avoid races between hotplug "add" and "remove" leading to "leaked"
loopback devices.

- Don't set up a loopback device if xend is no longer waiting for the
  vbd.
- Use the lock file to avoid add/remove races.

Signed-off-by: Gary Grebus <gary.grebus@oracle.com>
16 years agoxen-hvmctx: add recently added gtsc_khz field to output
Keir Fraser [Tue, 3 Nov 2009 08:37:52 +0000 (08:37 +0000)]
xen-hvmctx: add recently added gtsc_khz field to output

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
16 years agoFixes after addition of dummy_vcpu_info.
Keir Fraser [Mon, 2 Nov 2009 09:38:34 +0000 (09:38 +0000)]
Fixes after addition of dummy_vcpu_info.

 - Cleanly initialise the new vcpu_info in map_vcpu_info() if the
   vcpu was previously using the shared dummy structure.
 - Don't allow a vcpu to run with the shared dummy info structure, as
   no good can come of it.
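
The sentinel pattern behind dummy_vcpu_info, together with the two fixes above, can be sketched in C. The types and functions are heavily simplified and hypothetical. Copying old contents when a vcpu already had real storage is an assumption of this sketch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified types for illustration. */
struct vcpu_info { unsigned char evtchn_upcall_pending; unsigned long pending_sel; };
struct vcpu { int id; struct vcpu_info *vcpu_info; };

/* Shared placeholder: every vcpu points here until it gets real storage. */
static struct vcpu_info dummy_vcpu_info;

void vcpu_init(struct vcpu *v, int id)
{
    v->id = id;
    v->vcpu_info = &dummy_vcpu_info;  /* never NULL, so no NULL checks */
}

/* Switch a vcpu to real storage. */
void map_vcpu_info(struct vcpu *v, struct vcpu_info *real)
{
    if (v->vcpu_info == &dummy_vcpu_info)
        memset(real, 0, sizeof(*real));             /* start clean */
    else
        memcpy(real, v->vcpu_info, sizeof(*real));  /* carry state over */
    v->vcpu_info = real;
}

/* A vcpu still using the shared dummy must not be scheduled. */
bool vcpu_runnable(const struct vcpu *v)
{
    return v->vcpu_info != &dummy_vcpu_info;
}
```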

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoExtend the max vcpu number for HVM guest.
Keir Fraser [Thu, 29 Oct 2009 14:48:28 +0000 (14:48 +0000)]
Extend the max vcpu number for HVM guest.
 - Originally the max vcpu number for HVM guests was 32; this patch
 extends it to 128 on an x86_64 hypervisor. (On an i386 hypervisor,
 the max vcpu number is still 32.)
 - This patch extends the mp-table size to fit more vcpus.
 - The HVM PV driver should call the VCPUOP_register_vcpu_info
 hypercall to initialize the vcpu info if the vcpu number is more
 than 32.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoAMD IOMMU: remove a BUG_ON condition, to allow boot
Keir Fraser [Thu, 29 Oct 2009 14:05:46 +0000 (14:05 +0000)]
AMD IOMMU: remove a BUG_ON condition, to allow boot

Signed-off-by: Wei Wang <wei.wang2@amd.com>
16 years agostubdom: make stubdom-dm exit properly
Keir Fraser [Thu, 29 Oct 2009 14:04:45 +0000 (14:04 +0000)]
stubdom: make stubdom-dm exit properly

The bash built-in command wait should be able to take a pid argument
and wait only for the specified process to die, but it currently has a
bug: what it actually does is wait for the death of all the children.
For this reason the stubdom-dm script doesn't exit properly after
stubdom destruction. This patch solves the issue by spawning only one
child, removing the sleep subprocess workaround that was used to
create a usable stdin for "xm console" and replacing it with a fifo.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agoExtend max vcpu number for HVM guest
Keir Fraser [Thu, 29 Oct 2009 14:03:56 +0000 (14:03 +0000)]
Extend max vcpu number for HVM guest

Reduce size of Xen-qemu shared ioreq structure to 32 bytes. This
has two advantages:
 1. We can support up to 128 VCPUs with a single shared page
 2. If/when we want to go beyond 128 VCPUs, a whole number of ioreq_t
    structures will pack into a single shared page, so a multi-page
    array will have no ioreq_t straddling a page boundary
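
The arithmetic behind both points, assuming 4096-byte pages, is 4096 / 32 = 128 slots per page. Because 32 divides 4096 exactly, slot k of a multi-page array lives wholly in page k / 128. The struct below is an illustrative 32-byte layout, not the real ioreq_t field list. The helper functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Illustrative 32-byte request slot; not the real ioreq_t definition. */
struct ioreq_sketch {
    uint64_t addr;
    uint64_t data;
    uint32_t count;
    uint32_t size;
    uint32_t vp_eport;
    uint8_t  state;
    uint8_t  dir;
    uint8_t  type;
    uint8_t  pad;
};

/* Both properties from the commit message, checked at compile time. */
_Static_assert(sizeof(struct ioreq_sketch) == 32, "slot must be 32 bytes");
_Static_assert(PAGE_SIZE % sizeof(struct ioreq_sketch) == 0,
               "slots must not straddle page boundaries");

/* Number of slots (one per VCPU) that fit in 'pages' shared pages. */
size_t slots_in_pages(size_t pages)
{
    return pages * (PAGE_SIZE / sizeof(struct ioreq_sketch));
}

/* Page index and byte offset of the slot for a given VCPU. */
void slot_location(unsigned vcpu, size_t *page, size_t *offset)
{
    size_t per_page = PAGE_SIZE / sizeof(struct ioreq_sketch);

    *page = vcpu / per_page;
    *offset = (vcpu % per_page) * sizeof(struct ioreq_sketch);
}
```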

Also, while modifying qemu, replace a 32-entry vcpu-indexed array
with a dynamically-allocated array.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoUpdate .hgignore list
Keir Fraser [Thu, 29 Oct 2009 11:50:09 +0000 (11:50 +0000)]
Update .hgignore list

16 years agoPoint per-vcpu vcpu_info at a dummy structure by default, avoiding
Keir Fraser [Thu, 29 Oct 2009 11:14:54 +0000 (11:14 +0000)]
Point per-vcpu vcpu_info at a dummy structure by default, avoiding
the need for scattered NULL-pointer checks.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agominios: xmalloc and realloc fixes
Keir Fraser [Thu, 29 Oct 2009 08:34:51 +0000 (08:34 +0000)]
minios: xmalloc and realloc fixes

 - xmalloc currently faults if xmalloc_new_page fails due to OOM
 - realloc treats xmalloc_hdr.size as the size of just the data region
   rather than the total size of data region + headers + padding.
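
Both fixes can be illustrated with a toy allocator built on libc `malloc()`. It uses a hypothetical header whose `size` field records the total block size (header + data + padding), the convention the second fix restores. This is not Mini-OS's page-based xmalloc. It shows only that realloc must subtract the header to get the usable data size, and that out-of-memory is reported as NULL instead of faulting.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; 'size' is the TOTAL block size in bytes. */
union hdr { size_t size; max_align_t align; };
#define HDR sizeof(union hdr)

void *sketch_malloc(size_t n)
{
    /* Data region rounded up to a multiple of HDR: header+data+padding. */
    size_t total = HDR + (n + HDR - 1) / HDR * HDR;
    union hdr *h = malloc(total);

    if (h == NULL)
        return NULL;                /* out of memory: report, don't fault */
    h->size = total;
    return h + 1;
}

void sketch_free(void *p)
{
    if (p != NULL)
        free((union hdr *)p - 1);
}

void *sketch_realloc(void *p, size_t n)
{
    union hdr *h;
    size_t usable;
    void *q;

    if (p == NULL)
        return sketch_malloc(n);
    h = (union hdr *)p - 1;
    usable = h->size - HDR;         /* data bytes; size includes header */
    if (n <= usable)
        return p;                   /* still fits: keep the block */
    q = sketch_malloc(n);
    if (q == NULL)
        return NULL;                /* old block stays valid */
    memcpy(q, p, usable);
    free(h);
    return q;
}
```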

From: James Pendergrass <James.Pendergrass@jhuapl.edu>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoiommu: Do not initialise global vars explicitly to zero.
Keir Fraser [Wed, 28 Oct 2009 17:27:47 +0000 (17:27 +0000)]
iommu: Do not initialise global vars explicitly to zero.

This is unnecessary, and it prevents them from being allocated in BSS
rather than in the data section.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>